A Proposal on Evaluation Measures for RTE
Abstract
We outline problems with the interpretation of accuracy in the presence of bias, arguing that the issue is a particularly pressing concern for RTE evaluation. Furthermore, we argue that average precision scores are unsuitable for RTE, and should not be reported. We advocate mutual information as a new evaluation measure that should be reported in addition to accuracy and confidence-weighted score.
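To make the concern about accuracy under bias concrete, here is a minimal Python sketch. It assumes the standard empirical mutual information between gold labels and system predictions, which may differ from the exact formulation the paper proposes. The label names, the 80/20 class split, and both hypothetical systems are illustrative assumptions, not data from the paper.

```python
import math
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items where the prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def mutual_information(gold, pred):
    """Empirical mutual information, in bits, between gold and predicted labels."""
    n = len(gold)
    joint = Counter(zip(gold, pred))
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    mi = 0.0
    for (g, p), c in joint.items():
        # p(g, p) * log2( p(g, p) / (p(g) * p(p)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (gold_counts[g] * pred_counts[p]))
    return mi


# Hypothetical biased test set: 80 entailed pairs, 20 non-entailed pairs.
gold = ["YES"] * 80 + ["NO"] * 20

# Hypothetical system A: always predicts the majority class.
majority = ["YES"] * 100

# Hypothetical system B: makes errors on both classes but tracks the gold labels.
informed = ["YES"] * 70 + ["NO"] * 10 + ["YES"] * 10 + ["NO"] * 10

for name, pred in [("majority", majority), ("informed", informed)]:
    print(name, accuracy(gold, pred), mutual_information(gold, pred))
```

Both hypothetical systems reach the same accuracy of 0.8 on this test set, yet the majority-class system shares zero information with the gold labels while the second system's mutual information is positive. Accuracy alone cannot separate the two.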
Similar resources
Textual Entailment as an Evaluation Framework for Metaphor Resolution: A Proposal
We aim to address two complementary deficiencies in Natural Language Processing (NLP) research: (i) Despite the importance and prevalence of metaphor across many discourse genres, and metaphor’s many functions, applied NLP has mostly not addressed metaphor understanding. But, conversely, (ii) difficult issues in metaphor understanding have hindered large-scale application, extensive empirical e...
Development and Usability Evaluation of an Online Tutorial for “How to Write a Proposal” for Medical Sciences Students
Background and Objective: Considering the importance of learning how to write a proposal for students, this study was performed to develop an online tutorial for “How to write a Proposal” for students and to evaluate its usability. Methods: This study is a developmental research and tool design. “Gamified Online Tutorial based on Self-Determination Theory (GOT-STD) Framework" became the basis f...
The usefulness of transient elastography, acoustic-radiation-force impulse elastography, and real-time elastography for the evaluation of liver fibrosis
BACKGROUND/AIMS Several noninvasive methods have recently been developed for the evaluation of liver fibrosis. The accuracy of transient elastography (TE), acoustic-radiation-force impulse (ARFI) elastography, and real-time elastography (RTE) in predicting liver fibrosis were evaluated. METHODS Seventy-four patients who had undergone a liver biopsy within the previous 6 months were submitted ...
SPARTE, a Test Suite for Recognising Textual Entailment in Spanish
The aim of Recognising Textual Entailment (RTE) is to determine whether the meaning of a text entails the meaning of another text named hypothesis. RTE systems can be applied to validate the answers of Question Answering (QA) systems. Once the answer to a question is given by the QA system, a hypothesis is built turning the question plus the answer into an affirmative form. If the text (a given...
Análise de Medidas de Similaridade Semântica na Tarefa de Reconhecimento de Implicação Textual (Analysis of Semantic Similarity Measures in the Recognition of Textual Entailment Task) [In Portuguese]
In this work, we present a feature-based approach to the RTE (Recognizing Text Entailment) task that verifies the similarity between two sentences including syntactic and semantic aspects. The selected features come from the winning work of the RTE task of the workshop ASSIN (Semantic Similarity Evaluation and Textual Inference) with some changes and addition of other semantic feature. The eval...
Publication date: 2009